Recovering all generalized order-preserving submatrices: new exact formulations and algorithms
Authors
Andrew C. Trapp, Foisie School of Business, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609, USA. Tel.: +1-508-831-4935
Chao Li, Department of Computer Science, Worcester Polytechnic Institute, 100 Institute Rd., Worcester, MA 01609, USA
Patrick Flaherty, Department of Mathematics and Statistics, University of Massachusetts, Amherst, MA 01003, USA
Abstract
Cluster analysis of gene expression data is a popular and successful way of elucidating underlying biological processes. Typically, cluster analysis methods seek to group genes that are differentially expressed across experimental conditions. However, real biological processes often involve only a subset of genes and are activated in only a subset of environmental or temporal conditions. To address this limitation, Ben-Dor et al. (2003) developed an approach to identify order-preserving submatrices (OPSMs), in which the expression levels of the included genes induce the same linear ordering of experiments. In addition to gene expression analysis, OPSMs have applications in recommender systems and target marketing. While the problem of finding the largest OPSM is NP-hard, there have been significant advances in both exact and approximate algorithms in recent years. Building upon these developments, we provide two exact mathematical programming formulations that generalize the OPSM formulation by allowing for the reverse linear ordering, known as the generalized OPSM pattern, or GOPSM. Our formulations incorporate a constraint that provides a margin of safety against detecting spurious GOPSMs. Finally, we provide two novel algorithms that iteratively solve mathematical programming formulations to global optimality to recover, for any given level of significance, all GOPSMs from a given data matrix. We demonstrate the computational performance and accuracy of our algorithms on real gene expression data sets.
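To make the pattern definition concrete, the following is a minimal sketch of a membership check, not the paper's mathematical programming formulation. It assumes strict orderings (ties disqualify a row) and reads the generalized pattern as "every selected row is either increasing or decreasing along one shared column order". The function names `row_direction` and `is_opsm` and the toy matrix are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def row_direction(values: Sequence[float]) -> int:
    """Return +1 if strictly increasing, -1 if strictly decreasing, else 0."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return 1
    if all(a > b for a, b in pairs):
        return -1
    return 0


def is_opsm(matrix: List[List[float]], rows: List[int], cols: List[int],
            generalized: bool = False) -> bool:
    """Check whether the submatrix (rows x cols) is order-preserving.

    Assumption: with generalized=True, a row may also follow the
    reverse of the shared column ordering (the GOPSM pattern).
    """
    if not rows or len(cols) < 2:
        return True  # trivially ordered
    # If a valid shared ordering exists, the first selected row must be
    # monotone along it, so sorting the columns by that row recovers the
    # ordering (up to reversal in the generalized case).
    reference = matrix[rows[0]]
    order = sorted(cols, key=lambda c: reference[c])
    for r in rows:
        direction = row_direction([matrix[r][c] for c in order])
        if direction == 1:
            continue
        if generalized and direction == -1:
            continue
        return False
    return True


# Toy data: rows are genes, columns are experiments (illustrative values).
toy = [
    [1.0, 5.0, 3.0, 9.0],
    [2.0, 8.0, 4.0, 7.0],
    [9.0, 2.0, 6.0, 1.0],
]
print(is_opsm(toy, [0, 1], [0, 2, 1]))                       # True
print(is_opsm(toy, [0, 1, 2], [0, 2, 1]))                    # False
print(is_opsm(toy, [0, 1, 2], [0, 2, 1], generalized=True))  # True
```

The check only verifies a candidate submatrix. Finding the largest such submatrix is the NP-hard part that the formulations in the paper address.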
Similar resources
A New Approach for Mining Order-Preserving Submatrices Based on All Common Subsequences
Order-preserving submatrices (OPSMs) have been applied in many fields, such as DNA microarray data analysis, automatic recommendation systems, and target marketing systems, as an important unsupervised learning model. Unfortunately, most existing methods are heuristic algorithms, which are unable to reveal all OPSMs in this NP-complete problem. In particular, deep OPSMs, corresponding to long pa...
Towards Scalable Algorithms for Discovering Rough Set Reducts
Rough set theory allows one to find reducts from a decision table, which are minimal sets of attributes preserving the required quality of classification. In this article, we propose a number of algorithms for discovering all generalized reducts (preserving generalized decisions), all possible reducts (preserving upper approximations) and certain reducts (preserving lower approximations). The n...
Filtration Algorithms for Approximate Order-Preserving Matching
The exact order-preserving matching problem is to find all the substrings of a text T which have the same length and relative order as a pattern P. Like string matching, order-preserving matching can be generalized by allowing the match to be approximate. In approximate order-preserving matching two strings match if they have the same relative order after removing up to k elements in the same p...
Extending the Order Preserving Submatrix: New patterns in datasets
This paper concerns finding local patterns in gene expression datasets. We present new order relation patterns and develop algorithms that find those patterns. Our algorithms are the first to find exact results for those patterns, yet in most cases they outperform existing heuristic algorithms. Finally we present an algorithm for the broader problem of frequent itemset min...
Heuristic and exact algorithms for Generalized Bin Covering Problem
In this paper, we study the Generalized Bin Covering problem. For this problem, an exact algorithm is introduced which can find optimal solutions for small-scale instances. To find a near-optimal solution for large-scale instances, a heuristic algorithm is proposed. The efficiency of the heuristic algorithm is assessed by computational experiments.
Journal title: Annals OR
Volume: 263
Publication date: 2018